
Takeda's psoriasis pill developed with AI assistance succeeds in trials

The Japan Times

Psoriasis is a chronic autoimmune disorder that causes itchy, scaly rashes and afflicts more than 125 million people worldwide. Takeda Pharmaceutical announced that its oral psoriasis drug zasocitinib proved safe and effective in late-stage trials, marking a milestone in its effort to treat the incurable skin condition and offset looming revenue pressure. Patients with moderate-to-severe plaque psoriasis who took the once-daily pill showed significantly clearer skin than those on placebo or the existing therapy apremilast, the company said in a statement Thursday. Takeda plans to submit data to the U.S. Food and Drug Administration and other regulators beginning in fiscal year 2026. If approved, zasocitinib would join the small but growing class of oral psoriasis treatments -- a market long dominated by ointments and injectable antibody therapies -- and stand out as one of the first drugs discovered with the help of artificial intelligence.


Russia-Ukraine war: List of key events, day 1,350

Al Jazeera

Is Trump losing patience with Putin? Will sanctions against Russian oil giants hurt Putin? Russian and Ukrainian troops have fought battles in the ruins of Pokrovsk, a transport and logistics hub in eastern Ukraine, with Ukraine's military reporting fierce fighting under way in a part of the city that was key for Kyiv's front-line logistics. Ukrainian President Volodymyr Zelenskyy said he visited troops fighting near the eastern city of Dobropillia, where Ukrainian forces are conducting a counteroffensive against Russian troops. Russia struck civilian energy and port infrastructure in a massive overnight drone attack on Ukraine's southern region of Odesa, the region's governor said in a post on the Telegram messaging app, adding that rescuers extinguished fires and there were no casualties.


Breaking the Transcription Bottleneck: Fine-tuning ASR Models for Extremely Low-Resource Fieldwork Languages

Liang, Siyu, Levow, Gina-Anne

arXiv.org Artificial Intelligence

Automatic Speech Recognition (ASR) has reached impressive accuracy for high-resource languages, yet its utility in linguistic fieldwork remains limited. Recordings collected in fieldwork contexts present unique challenges, including spontaneous speech, environmental noise, and severely constrained datasets from under-documented languages. In this paper, we benchmark the performance of two fine-tuned multilingual ASR models, MMS and XLS-R, on five typologically diverse low-resource languages while controlling for training data duration. Our findings show that MMS is best suited when extremely small amounts of training data are available, whereas XLS-R reaches parity once training data exceed one hour. We provide a linguistically grounded analysis and practical guidelines for field linguists, highlighting reproducible ASR adaptation approaches to mitigate the transcription bottleneck in language documentation.
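
The fine-tuning setup such a benchmark depends on can be sketched with the Hugging Face transformers library; the XLS-R checkpoint, toy character vocabulary, and synthetic audio below are illustrative assumptions, not the authors' exact configuration.

```python
import json
import torch
from transformers import (Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor,
                          Wav2Vec2ForCTC, Wav2Vec2Processor)

# Character vocabulary; in practice this is built from the fieldwork transcripts.
vocab = {ch: i for i, ch in enumerate(sorted(set("abcdefghijklmnopqrstuvwxyz' ")))}
vocab["[UNK]"] = len(vocab)
vocab["[PAD]"] = len(vocab)
with open("vocab.json", "w") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]",
                                 pad_token="[PAD]", word_delimiter_token=" ")
feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16000,
                                             padding_value=0.0, do_normalize=True,
                                             return_attention_mask=True)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the multilingual pretrained encoder and attach a fresh character-level CTC head.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",
    ctc_loss_reduction="mean",
    pad_token_id=tokenizer.pad_token_id,
    vocab_size=len(tokenizer),
)
model.freeze_feature_encoder()  # keep the convolutional front end frozen

# One toy training step on a single (audio, transcript) pair.
audio = torch.randn(16000).numpy()  # 1 second of fake 16 kHz audio
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
labels = tokenizer("a toy transcript", return_tensors="pt").input_ids
loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()
print(f"CTC loss: {loss.item():.3f}")
```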


Long Context In-Context Compression by Getting to the Gist of Gisting

Petrov, Aleksandar, Sandler, Mark, Zhmoginov, Andrey, Miller, Nolan, Vladymyrov, Max

arXiv.org Artificial Intelligence

Long context processing is critical for the adoption of LLMs, but existing methods often introduce architectural complexity that hinders their practical deployment. Gisting, an in-context compression method with no architectural modification to the decoder transformer, is a promising approach due to its simplicity and compatibility with existing frameworks. While effective for short instructions, we demonstrate that gisting struggles with longer contexts, with significant performance drops even at minimal compression rates. Surprisingly, a simple average pooling baseline consistently outperforms gisting. We analyze the limitations of gisting, including information flow interruptions, capacity limitations, and the inability to restrict its attention to subsets of the context. Motivated by theoretical insights into the performance gap between gisting and average pooling, and supported by extensive experimentation, we propose GistPool, a new in-context compression method. GistPool preserves the simplicity of gisting, while significantly boosting its performance on long context compression tasks.
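
The average-pooling baseline the paper contrasts with gisting can be illustrated in a few lines; the shapes and compression rate below are arbitrary, and GistPool itself builds on this idea rather than being identical to it.

```python
import torch

def average_pool_compress(hidden: torch.Tensor, rate: int) -> torch.Tensor:
    """Compress (batch, seq_len, dim) hidden states by a factor of `rate`."""
    batch, seq_len, dim = hidden.shape
    pad = (-seq_len) % rate  # right-pad with zeros so seq_len divides evenly
    if pad:
        hidden = torch.cat([hidden, hidden.new_zeros(batch, pad, dim)], dim=1)
    chunks = hidden.view(batch, -1, rate, dim)  # (batch, seq_len/rate, rate, dim)
    return chunks.mean(dim=2)                   # one pooled vector per chunk

context = torch.randn(1, 1024, 768)             # toy long-context activations
compressed = average_pool_compress(context, rate=16)
print(compressed.shape)                         # torch.Size([1, 64, 768])
```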


Evaluate Summarization in Fine-Granularity: Auto Evaluation with LLM

Yuan, Dong, Rastogi, Eti, Zhao, Fen, Goyal, Sagar, Naik, Gautam, Rajagopal, Sree Prasanna

arXiv.org Artificial Intelligence

Due to the exponential growth of information and the need for efficient information consumption, the task of summarization has gained paramount importance. Evaluating summarization accurately and objectively presents significant challenges, particularly for long, unstructured, content-rich texts. Existing methods, such as ROUGE (Lin, 2004) and embedding similarities, often yield scores that correlate poorly with human judgements and are not intuitively understandable, making it difficult to gauge the true quality of the summaries. LLMs can mimic humans in giving subjective reviews, but subjective scores are hard to interpret and justify, and they can be easily manipulated by altering the models and the tone of the prompts. In this paper, we introduce a novel evaluation methodology and tooling designed to address these challenges, providing a more comprehensive, accurate, and interpretable assessment of summarization outputs. Our method (SumAutoEval) proposes and evaluates metrics at varying granularity levels, giving objective scores on four key dimensions: completeness, correctness, alignment, and readability. We empirically demonstrate that SumAutoEval enhances the understanding of output quality with better correlation to human judgements.
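
One way to picture fine-granularity, objective scoring is to count atomic statements once the source and summary have been decomposed (for example, by an LLM prompt); the sketch below is a hedged illustration of that idea, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class GranularScores:
    completeness: float  # fraction of source statements covered by the summary
    correctness: float   # fraction of summary statements supported by the source

def score_summary(source_facts: set, summary_facts: set) -> GranularScores:
    overlap = source_facts & summary_facts
    return GranularScores(
        completeness=len(overlap) / max(len(source_facts), 1),
        correctness=len(overlap) / max(len(summary_facts), 1),
    )

scores = score_summary(
    source_facts={"drug is oral", "trial was late-stage", "condition is psoriasis"},
    summary_facts={"drug is oral", "condition is psoriasis"},
)
print(scores)  # completeness ~0.67, correctness = 1.0
```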


CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model

Go, Dongyoung, Whang, Taesun, Lee, Chanhee, Kim, Hwa-Yeon, Park, Sunghoon, Ji, Seunghwan, Kim, Jinho, Kim, Dongchan, Kim, Young-Bum

arXiv.org Artificial Intelligence

The integration of Retrieval-Augmented Generation (RAG) with Multimodal Large Language Models (MLLMs) has revolutionized information retrieval and expanded the practical applications of AI. However, current systems struggle to accurately interpret user intent, employ diverse retrieval strategies, and effectively filter unintended or inappropriate responses, limiting their effectiveness. This paper introduces Contextual Understanding and Enhanced Search with MLLM (CUE-M), a novel multimodal search framework that addresses these challenges through a multi-stage pipeline comprising image context enrichment, intent refinement, contextual query generation, external API integration, and relevance-based filtering. CUE-M incorporates a robust filtering pipeline combining image-based, text-based, and multimodal classifiers, dynamically adapting to instance- and category-specific concerns defined by organizational policies. Evaluations on a multimodal Q&A dataset and a public safety benchmark demonstrate that CUE-M outperforms baselines in accuracy, knowledge integration, and safety, advancing the capabilities of multimodal retrieval systems.
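
The multi-stage structure described above can be sketched as a chain of stages passing a shared state object; every stage here is a trivial stub standing in for the MLLM calls, retrieval APIs, and trained classifiers the real system uses.

```python
from dataclasses import dataclass, field

@dataclass
class SearchState:
    image_caption: str
    user_query: str
    refined_intent: str = ""
    search_queries: list = field(default_factory=list)
    results: list = field(default_factory=list)

def enrich_image_context(state: SearchState) -> SearchState:
    # Stand-in for an MLLM that augments the raw caption with retrieved entities.
    state.image_caption += " (entity: unidentified tower)"
    return state

def refine_intent(state: SearchState) -> SearchState:
    state.refined_intent = f"{state.user_query} regarding {state.image_caption}"
    return state

def generate_queries(state: SearchState) -> SearchState:
    state.search_queries = [state.refined_intent, state.user_query]
    return state

def call_external_apis(state: SearchState) -> SearchState:
    # Stand-in for web/image search API calls.
    state.results = [f"document retrieved for: {q}" for q in state.search_queries]
    return state

def filter_by_relevance_and_safety(state: SearchState) -> SearchState:
    # Stand-in for the image-, text-, and multimodal-classifier filtering stage.
    state.results = [r for r in state.results if "unsafe" not in r]
    return state

state = SearchState(image_caption="a photo of a tall tower", user_query="how tall is it?")
for stage in (enrich_image_context, refine_intent, generate_queries,
              call_external_apis, filter_by_relevance_and_safety):
    state = stage(state)
print(state.results)
```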


Automatic Speech Recognition for the Ika Language

Nzenwata, Uchenna, Ogbuigwe, Daniel

arXiv.org Artificial Intelligence

We present a cost-effective approach for developing Automatic Speech Recognition (ASR) models for low-resource languages like Ika. We fine-tune pretrained wav2vec 2.0 Massively Multilingual Speech models on a high-quality speech dataset compiled from New Testament Bible translations in Ika. Our results show that fine-tuning multilingual pretrained models achieves a Word Error Rate (WER) of 0.5377 and a Character Error Rate (CER) of 0.2651 with just over one hour of training data. The larger 1-billion-parameter model outperforms the smaller 300-million-parameter model due to its greater complexity and ability to store richer speech representations. However, we observe overfitting to the small training dataset, reducing generalizability. Our findings demonstrate the potential of leveraging multilingual pretrained models for low-resource languages. Future work should focus on expanding the dataset and exploring techniques to mitigate overfitting.
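
The reported WER and CER are standard edit-distance metrics and can be computed with the jiwer library; the reference and hypothesis transcripts below are invented for illustration.

```python
import jiwer

references = ["the gospel according to mark", "in the beginning was the word"]
hypotheses = ["the gospel according mark", "in beginning was the word"]

# WER counts word-level substitutions/insertions/deletions; CER counts character-level ones.
wer = jiwer.wer(references, hypotheses)
cer = jiwer.cer(references, hypotheses)
print(f"WER={wer:.4f}  CER={cer:.4f}")
```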


Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings

Attia, Ahmed Adel, Demszky, Dorottya, Ogunremi, Tolulope, Liu, Jing, Espy-Wilson, Carol

arXiv.org Artificial Intelligence

Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools that aid teachers and students. In this work, we study the efficacy of continued pretraining (CPT) in adapting Wav2vec2.0 to the classroom domain. We show that CPT is a powerful tool in that regard, reducing the Word Error Rate (WER) of Wav2vec2.0-based models by upwards of 10%. More specifically, CPT improves the model's robustness to different noises, microphones, and classroom conditions, as well as to classroom demographics. Our CPT models show an improved ability to generalize to demographics unseen in the labeled fine-tuning data.
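
Continued pretraining here means running wav2vec 2.0's self-supervised contrastive objective on unlabeled in-domain audio before any supervised fine-tuning. A minimal sketch with Hugging Face's Wav2Vec2ForPreTraining follows; the base checkpoint and synthetic audio are stand-ins for the classroom corpus used in the paper.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForPreTraining
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    _compute_mask_indices, _sample_negative_indices)

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-base")
model.train()

# Stand-in for a batch of unlabeled classroom audio (5 s at 16 kHz).
audio = torch.randn(16000 * 5).numpy()
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")

# Sample masked time steps and contrastive negatives, as in wav2vec 2.0 pretraining.
seq_len = int(model._get_feat_extract_output_lengths(inputs.input_values.shape[-1]))
mask_time_indices = _compute_mask_indices((1, seq_len), mask_prob=0.2, mask_length=10)
negatives = _sample_negative_indices((1, seq_len),
                                     num_negatives=model.config.num_negatives,
                                     mask_time_indices=mask_time_indices)

outputs = model(
    input_values=inputs.input_values,
    mask_time_indices=torch.tensor(mask_time_indices, dtype=torch.bool),
    sampled_negative_indices=torch.tensor(negatives, dtype=torch.long),
)
outputs.loss.backward()  # contrastive + diversity loss on in-domain audio
print(f"pretraining loss: {outputs.loss.item():.3f}")
```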


Early Detection of Bark Beetle Attack Using Remote Sensing and Machine Learning: A Review

Marvasti-Zadeh, Seyed Mojtaba, Goodsman, Devin, Ray, Nilanjan, Erbilgin, Nadir

arXiv.org Artificial Intelligence

This paper provides a comprehensive review of past and current advances in the early detection of bark beetle-induced tree mortality from three primary perspectives: bark beetle & host interactions, remote sensing (RS), and machine learning/deep learning (ML/DL). In contrast to prior efforts, this review encompasses all RS systems and emphasizes ML/DL methods to investigate their strengths and weaknesses. We parse the existing literature based on multi- or hyperspectral analyses and distill their knowledge based on: bark beetle species & attack phases, with a primary emphasis on early stages of attack; host trees; study regions; RS platforms & sensors; spectral/spatial/temporal resolutions; spectral signatures; spectral vegetation indices (SVIs); ML approaches; learning schemes; task categories; models; algorithms; classes/clusters; features; and DL networks & architectures. Although DL-based methods and the random forest (RF) algorithm showed promising results, highlighting their potential to detect subtle changes across the visible, thermal, and short-wave infrared (SWIR) spectral regions, they still have limited effectiveness and high uncertainties. To inspire novel solutions to these shortcomings, we delve into the principal challenges & opportunities from different perspectives, enabling a deeper understanding of the current state of research and guiding future research directions.
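
A typical pipeline of the kind the review surveys, with spectral vegetation indices as features and a random forest classifier on top, can be sketched as follows; the reflectance values and attack labels are synthetic placeholders, not data from any of the reviewed studies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
red = rng.uniform(0.02, 0.15, n)   # red-band reflectance per pixel
nir = rng.uniform(0.20, 0.60, n)   # near-infrared reflectance
swir = rng.uniform(0.10, 0.40, n)  # short-wave infrared reflectance

ndvi = (nir - red) / (nir + red)     # Normalized Difference Vegetation Index
ndmi = (nir - swir) / (nir + swir)   # a moisture-sensitive index (NDMI-style)
X = np.column_stack([ndvi, ndmi, red, nir, swir])
y = (ndmi + 0.1 * rng.normal(size=n) < 0.2).astype(int)  # toy "attacked" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```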


NLG Evaluation Metrics Beyond Correlation Analysis: An Empirical Metric Preference Checklist

Ni'mah, Iftitahu, Fang, Meng, Menkovski, Vlado, Pechenizkiy, Mykola

arXiv.org Artificial Intelligence

In this study, we analyze automatic evaluation metrics for Natural Language Generation (NLG), specifically task-agnostic metrics and human-aligned metrics. Task-agnostic metrics, such as Perplexity, BLEU, and BERTScore, are cost-effective and highly adaptable to diverse NLG tasks, yet they correlate weakly with human judgements. Human-aligned metrics (CTC, CtrlEval, UniEval) improve the correlation level by incorporating desirable human-like qualities as training objectives. However, their effectiveness at discerning system-level performance and the quality of system outputs remains unclear. We present a metric preference checklist as a framework to assess the effectiveness of automatic metrics in three NLG tasks: Text Summarization, Dialogue Response Generation, and Controlled Generation. Our proposed framework provides a means (i) to verify whether automatic metrics are faithful to human preference, regardless of their correlation level with humans; and (ii) to inspect the strengths and limitations of NLG systems via pairwise evaluation. We show that automatic metrics provide better guidance than humans in discriminating system-level performance in Text Summarization and Controlled Generation tasks. We also show that the multi-aspect human-aligned metric (UniEval) is not necessarily dominant over single-aspect human-aligned metrics (CTC, CtrlEval) and task-agnostic metrics (BLEU, BERTScore), particularly in Controlled Generation tasks.
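
The pairwise, system-level check at the heart of such a preference framework reduces to asking whether a metric ranks each pair of systems the same way human ratings do; the system names and scores below are invented for illustration.

```python
from itertools import combinations

human_scores  = {"sys_A": 0.71, "sys_B": 0.64, "sys_C": 0.58}
metric_scores = {"sys_A": 0.33, "sys_B": 0.35, "sys_C": 0.24}

agreements = []
for a, b in combinations(human_scores, 2):
    human_prefers_a = human_scores[a] > human_scores[b]
    metric_prefers_a = metric_scores[a] > metric_scores[b]
    agreements.append(human_prefers_a == metric_prefers_a)

# A metric faithful to human preference agrees on every pair of systems.
print(f"system-level agreement: {sum(agreements)}/{len(agreements)}")
```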